
feat: add OpenAI diarization support #651

Open
8times4 wants to merge 4 commits into TanStack:main from 8times4:feat/openai-transcription-diarization

8times4 commented May 27, 2026

🎯 Changes

This change adds diarization support for OpenAI's gpt-4o-transcribe-diarize model, based on https://developers.openai.com/api/docs/guides/speech-to-text?lang=javascript

✅ Checklist

  • I have followed the steps in the Contributing guide.
  • I have tested this code locally with pnpm run test:pr.

🚀 Release Impact

  • This change affects published code, and I have generated a changeset.
  • This change is docs/CI/dev-only (no release).

Summary by CodeRabbit

  • New Features

    • Added speaker diarization for OpenAI transcriptions with automatic speaker labeling and segment-level timestamps
    • New diarized JSON response format for structured, speaker-labeled transcripts
    • Transcription API now supports GPT-4o diarize model alongside existing models
  • Documentation

    • Updated transcription docs, examples, and best practices with diarization usage, options, and constraints

coderabbitai Bot commented May 27, 2026
Contributor


Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


This PR adds OpenAI speaker diarization support for transcription, enabling speaker-labeled segment output via the gpt-4o-transcribe-diarize model. It introduces type contracts for diarized_json format, extends provider options with diarization parameters, implements model detection and validation in the adapter, adds comprehensive tests, wires diarization into E2E infrastructure and examples, and documents the feature across guides and API references.

Changes

OpenAI Transcription Diarization Feature

| Layer | File(s) | Summary |
| --- | --- | --- |
| TranscriptionResponseFormat type and type imports | packages/ai/src/types.ts, packages/ai-client/src/generation-types.ts, packages/ai/src/activities/generateTranscription/index.ts | Introduces TranscriptionResponseFormat type alias enumerating supported formats and updates TranscriptionOptions, TranscriptionGenerateInput, and TranscriptionActivityOptions to reference it, enabling diarized_json format across the SDK. |
| OpenAI provider option types | packages/ai-openai/src/audio/transcription-provider-options.ts | Introduces OpenAITranscriptionResponseFormat union type and extends OpenAITranscriptionProviderOptions with response_format, prompt, and chunking_strategy fields for diarization configuration. |
| OpenAI adapter diarization implementation | packages/ai-openai/src/adapters/transcription.ts | Detects diarization-capable models, validates diarization constraints (no prompt, no include, no timestamp_granularities, speaker count limits), auto-configures chunking_strategy: auto for diarization, maps diarized_json request/response, parses diarized segments with speaker labels into TranscriptionSegment[], and preserves non-diarized transcription backward compatibility. |
| Validation and format mapping | packages/ai-openai/src/adapters/transcription.ts | Adds validateDiarizationOptions and extends response-format mapping to include diarized_json, enforcing diarization-only constraints and validating speaker metadata. |
| Diarization adapter tests | packages/ai-openai/tests/transcription-adapter.test.ts | Comprehensive test suite verifying diarization defaults (response_format: diarized_json, chunking_strategy: auto), explicit option forwarding, null chunking_strategy handling, segment ID normalization, alternate response format acceptance, and validation errors for unsupported options, speaker metadata constraints, and model/feature mismatches. |
| E2E test harness and feature routing | testing/e2e/src/lib/types.ts, testing/e2e/src/lib/feature-support.ts, testing/e2e/src/lib/features.ts, testing/e2e/src/lib/media-providers.ts, testing/e2e/src/lib/server-functions.ts, testing/e2e/src/routes/$provider/$feature.tsx | Extends E2E test infrastructure with transcription-diarization feature type, marks OpenAI as the only supporting provider, configures media feature routing, adds feature parameter to adapter creation to select model variants, and extends server function schemas to accept responseFormat and modelOptions. |
| E2E UI and API route updates | testing/e2e/src/components/TranscriptionUI.tsx, testing/e2e/src/routes/api.transcription.ts, testing/e2e/src/routes/api.transcription.stream.ts | Updates TranscriptionUI to accept feature prop, conditionally build diarization modelOptions, render speaker labels in segments, and updates API routes to parse and forward responseFormat, modelOptions, and feature through transcription generation with feature-aware adapter creation. |
| E2E fixtures and diarization test suite | testing/e2e/fixtures/transcription/, testing/e2e/tests/transcription.spec.ts | Adds diarized transcription response fixture with speaker-labeled segments and end-to-end test assertions for segment text, speaker labels, and delivery modes (SSE, HTTP stream, fetcher). |
| Example app transcription provider types | examples/ts-react-chat/src/lib/audio-providers.ts | Introduces openai-diarize provider ID and extends TranscriptionProviderConfig with optional transcriptionOptions for provider-specific diarization settings. |
| Example app server functions and routing | examples/ts-react-chat/src/lib/server-audio-adapters.ts, examples/ts-react-chat/src/lib/server-fns.ts, examples/ts-react-chat/src/routes/api.transcribe.ts, examples/ts-react-chat/src/routes/generations.transcription.tsx | Extends server functions to accept responseFormat and modelOptions, adds openai-diarize provider routing to gpt-4o-transcribe-diarize model, and wires diarization parameters through transcription generation and UI in example application. |
| Knip config update | knip.json | Removes packages/ai-openai/src/audio/transcribe-provider-options.ts from Knip ignore list to enable unused export detection. |
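
The adapter row above mentions parsing diarized segments with speaker labels and normalizing segment IDs. A minimal sketch of that mapping follows. The `DiarizedSegment` and `SpeakerSegment` shapes and the `toSpeakerSegments` helper are hypothetical names for illustration, and the field names are assumptions that may not match the real SDK types.

```typescript
// Hypothetical input shape: one raw segment as a diarized response might carry it.
interface DiarizedSegment {
  id: string | number
  start: number
  end: number
  text: string
  speaker?: string
}

// Hypothetical output shape: a speaker-labeled segment with a numeric ID.
interface SpeakerSegment {
  id: number
  start: number
  end: number
  text: string
  speaker?: string
}

// Convert raw diarized segments into sequentially numbered, speaker-labeled segments.
function toSpeakerSegments(raw: ReadonlyArray<DiarizedSegment>): SpeakerSegment[] {
  return raw.map((segment, index) => ({
    id: index, // replace provider IDs with a stable numeric index
    start: segment.start,
    end: segment.end,
    text: segment.text.trim(),
    // Only emit the speaker key when the provider actually labeled the segment.
    ...(segment.speaker !== undefined ? { speaker: segment.speaker } : {}),
  }))
}
```

Numbering by array index is one possible normalization; the adapter in this PR may choose a different scheme.
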

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • tombeckenham
  • AlemTuzlak
  • jherr

"🐰 I hopped through code with ears held high,
I labeled speakers as they spoke nearby,
Chunking set to auto, segments all align,
Diarized JSON makes each voice shine! 🎧✨"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat: add OpenAI diarization support' accurately and concisely describes the main feature addition: diarization support for OpenAI's transcription models. |
| Description check | ✅ Passed | The PR description follows the template structure, includes a clear summary of changes linking to OpenAI documentation, and has all checklist items properly addressed with required changeset confirmation. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/ai-openai/src/adapters/transcription.ts`:
- Around line 267-285: The diarization validation is missing a local guard for
responseFormat: update validateDiarizationOptions (used by transcribe and
guarded by isDiarizeTranscriptionModel) to throw when
modelOptions.responseFormat (or the mapped value from mapResponseFormat) is not
one of the allowed values ["json","text","diarized_json"]; ensure transcribe()
cannot send srt/vtt/verbose_json for diarize models by checking
modelOptions.responseFormat (or resolved response format) early and throwing a
clear error stating diarization models only support json, text, and
diarized_json; reference validateDiarizationOptions, transcribe,
mapResponseFormat, and isDiarizeTranscriptionModel when applying the change.
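
The guard requested above can be sketched as follows. This is an illustration only: `isDiarizeModel` and `assertDiarizeResponseFormat` are hypothetical stand-ins for the PR's `isDiarizeTranscriptionModel` and `validateDiarizationOptions`, and detecting the model by its `-diarize` suffix is an assumption.

```typescript
// Formats the review says diarization models accept.
const DIARIZE_RESPONSE_FORMATS = ['json', 'text', 'diarized_json'] as const

// Assumption: diarization-capable models are recognizable by their name suffix.
function isDiarizeModel(model: string): boolean {
  return model.endsWith('-diarize')
}

// Throw early, before any request is sent, when a diarize model is paired
// with a response format it cannot produce (for example srt, vtt, verbose_json).
function assertDiarizeResponseFormat(model: string, format: string | undefined): void {
  if (!isDiarizeModel(model) || format === undefined) return
  const allowed = DIARIZE_RESPONSE_FORMATS as ReadonlyArray<string>
  if (!allowed.includes(format)) {
    throw new Error(
      `${model} only supports the response formats json, text, and diarized_json; received "${format}".`,
    )
  }
}
```

Calling `assertDiarizeResponseFormat('gpt-4o-transcribe-diarize', 'srt')` throws, while the same format passes for a non-diarization model such as `whisper-1`.
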
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7c4b4b31-fb90-4e00-9d8f-1454f513e089

📥 Commits

Reviewing files that changed from the base of the PR and between 5634f18 and a59d368.

📒 Files selected for processing (13)
  • .changeset/openai-transcription-diarization.md
  • docs/adapters/openai.md
  • docs/comparison/vercel-ai-sdk.md
  • docs/media/generation-hooks.md
  • docs/media/transcription.md
  • docs/reference/interfaces/TranscriptionOptions.md
  • packages/ai-client/src/generation-types.ts
  • packages/ai-openai/src/adapters/transcription.ts
  • packages/ai-openai/src/audio/transcription-provider-options.ts
  • packages/ai-openai/tests/transcription-adapter.test.ts
  • packages/ai/skills/ai-core/media-generation/SKILL.md
  • packages/ai/src/activities/generateTranscription/index.ts
  • packages/ai/src/types.ts

Comment thread packages/ai-openai/src/adapters/transcription.ts
coderabbitai Bot commented May 28, 2026
Contributor

Actionable comments posted: 0

@AlemTuzlak AlemTuzlak requested a review from tombeckenham June 3, 2026 14:45
@tombeckenham tombeckenham force-pushed the feat/openai-transcription-diarization branch from 05dfb53 to fbb57a0 Compare June 4, 2026 03:47
nx-cloud Bot commented Jun 4, 2026

View your CI Pipeline Execution ↗ for commit 58aa20c

| Command | Status | Duration | Result |
| --- | --- | --- | --- |
| nx run-many --targets=build --exclude=examples/... | ✅ Succeeded | 55s | View ↗ |

☁️ Nx Cloud last updated this comment at 2026-06-16 21:54:33 UTC

pkg-pr-new Bot commented Jun 4, 2026

@tanstack/ai

npm i https://pkg.pr.new/@tanstack/ai@651

@tanstack/ai-anthropic

npm i https://pkg.pr.new/@tanstack/ai-anthropic@651

@tanstack/ai-client

npm i https://pkg.pr.new/@tanstack/ai-client@651

@tanstack/ai-code-mode

npm i https://pkg.pr.new/@tanstack/ai-code-mode@651

@tanstack/ai-code-mode-skills

npm i https://pkg.pr.new/@tanstack/ai-code-mode-skills@651

@tanstack/ai-devtools-core

npm i https://pkg.pr.new/@tanstack/ai-devtools-core@651

@tanstack/ai-elevenlabs

npm i https://pkg.pr.new/@tanstack/ai-elevenlabs@651

@tanstack/ai-event-client

npm i https://pkg.pr.new/@tanstack/ai-event-client@651

@tanstack/ai-fal

npm i https://pkg.pr.new/@tanstack/ai-fal@651

@tanstack/ai-gemini

npm i https://pkg.pr.new/@tanstack/ai-gemini@651

@tanstack/ai-grok

npm i https://pkg.pr.new/@tanstack/ai-grok@651

@tanstack/ai-groq

npm i https://pkg.pr.new/@tanstack/ai-groq@651

@tanstack/ai-isolate-cloudflare

npm i https://pkg.pr.new/@tanstack/ai-isolate-cloudflare@651

@tanstack/ai-isolate-node

npm i https://pkg.pr.new/@tanstack/ai-isolate-node@651

@tanstack/ai-isolate-quickjs

npm i https://pkg.pr.new/@tanstack/ai-isolate-quickjs@651

@tanstack/ai-mcp

npm i https://pkg.pr.new/@tanstack/ai-mcp@651

@tanstack/ai-ollama

npm i https://pkg.pr.new/@tanstack/ai-ollama@651

@tanstack/ai-openai

npm i https://pkg.pr.new/@tanstack/ai-openai@651

@tanstack/ai-openrouter

npm i https://pkg.pr.new/@tanstack/ai-openrouter@651

@tanstack/ai-preact

npm i https://pkg.pr.new/@tanstack/ai-preact@651

@tanstack/ai-react

npm i https://pkg.pr.new/@tanstack/ai-react@651

@tanstack/ai-react-ui

npm i https://pkg.pr.new/@tanstack/ai-react-ui@651

@tanstack/ai-solid

npm i https://pkg.pr.new/@tanstack/ai-solid@651

@tanstack/ai-solid-ui

npm i https://pkg.pr.new/@tanstack/ai-solid-ui@651

@tanstack/ai-svelte

npm i https://pkg.pr.new/@tanstack/ai-svelte@651

@tanstack/ai-utils

npm i https://pkg.pr.new/@tanstack/ai-utils@651

@tanstack/ai-vue

npm i https://pkg.pr.new/@tanstack/ai-vue@651

@tanstack/ai-vue-ui

npm i https://pkg.pr.new/@tanstack/ai-vue-ui@651

@tanstack/openai-base

npm i https://pkg.pr.new/@tanstack/openai-base@651

@tanstack/preact-ai-devtools

npm i https://pkg.pr.new/@tanstack/preact-ai-devtools@651

@tanstack/react-ai-devtools

npm i https://pkg.pr.new/@tanstack/react-ai-devtools@651

@tanstack/solid-ai-devtools

npm i https://pkg.pr.new/@tanstack/solid-ai-devtools@651

commit: 58aa20c

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/media/transcription.md`:
- Line 561: The example hardcodes 'whisper-1' in the createOpenaiTranscription
call; update the docs to use the provider's latest transcription model constant
exported from the OpenAI adapter's model-meta.ts instead of a string literal.
Import or reference the exported latest-model symbol from that file (e.g., the
adapter's LATEST_* or DEFAULT_* transcription model constant) and pass that
symbol into createOpenaiTranscription so the docs always use the adapter-defined
current OpenAI transcription model.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b2c455a0-25d6-4921-8f26-77965d2791be

📥 Commits

Reviewing files that changed from the base of the PR and between 05dfb53 and fbb57a0.

📒 Files selected for processing (6)
  • .changeset/openai-transcription-diarization.md
  • docs/adapters/openai.md
  • docs/comparison/vercel-ai-sdk.md
  • docs/media/generation-hooks.md
  • docs/media/transcription.md
  • docs/reference/interfaces/TranscriptionOptions.md
✅ Files skipped from review due to trivial changes (5)
  • .changeset/openai-transcription-diarization.md
  • docs/media/generation-hooks.md
  • docs/comparison/vercel-ai-sdk.md
  • docs/adapters/openai.md
  • docs/reference/interfaces/TranscriptionOptions.md

@coderabbitai coderabbitai Bot left a comment

Contributor

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 1

🛑 Comments failed to post (1)
docs/media/transcription.md (1)

561-561: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use the provider’s latest OpenAI transcription model in this example.

This changed snippet still hardcodes whisper-1; please update it to the latest OpenAI transcription model defined in the adapter model-meta.ts to keep docs aligned with project policy.

As per coding guidelines: “Use the latest model per provider in documentation example code, sourced from each adapter's model-meta.ts (newest gpt-*, claude-*, gemini-*, …)”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/media/transcription.md` at line 561, The example hardcodes 'whisper-1'
in the createOpenaiTranscription call; update the docs to use the provider's
latest transcription model constant exported from the OpenAI adapter's
model-meta.ts instead of a string literal. Import or reference the exported
latest-model symbol from that file (e.g., the adapter's LATEST_* or DEFAULT_*
transcription model constant) and pass that symbol into
createOpenaiTranscription so the docs always use the adapter-defined current
OpenAI transcription model.

@tombeckenham

Contributor

Hi @8times4, thank you for this. Would you be able to create an e2e test for this using aimock? The tests are in the e2e test package. Ideally, adding a way to see the results on one of the ts-react-chat example pages would be great as well

@tombeckenham

Contributor

Code review

Found 3 issues:

  1. No E2E test coverage was added for the diarization feature/behavior change (new gpt-4o-transcribe-diarize model, diarized_json responseFormat, speaker-labeled TranscriptionSegments, chunking_strategy + known_speaker_* options + validation). (CLAUDE.md says "Every feature, bug fix, or behavior change MUST include E2E test coverage." and "Add or update E2E tests — this is mandatory for any feature, bug fix, or behavior change"; see also the new-feature row in the E2E table and the Pre-PR Quality Gate requiring pnpm --filter @tanstack/ai-e2e test:e2e. AGENTS.md and the reviews of prior transcription PRs #409 (feat: extract @tanstack/openai-base and @tanstack/ai-utils packages) and #506 (feat(ai-grok): audio, speech, and realtime adapters + example wiring) establish the same convention: update feature-support.ts + test-matrix + fixture + spec.)

        id: generateId(this.name),
        model,
        text: response.text,
        duration: response.duration,
        ...(segments.length > 0 && { segments }),
      }
    }
    if (useVerbose) {
      const response = (await this.client.audio.transcriptions.create({
        ...request,

  2. The responseFormat union literal is duplicated (with | 'diarized_json' added) across three locations instead of being extracted into a shared type. (CLAUDE.md says "Always look for repeated code or if the function you are trying to implement is already in another file" and "Review code at the end to see if you can make it more concise and less repetitive".)

ai/packages/ai/src/types.ts

Lines 1723 to 1732 in 05dfb53

  confidence?: number
  /** Speaker identifier, if diarization is enabled */
  speaker?: string
}
/**
 * A single word with timing information.
 */
export interface TranscriptionWord {
  /** The transcribed word */

  3. The validation guards in the newly added validateDiarizationOptions (and its caller guard) are incomplete and inconsistent with modelOptions conventions: responseFormat is read from modelOptions through a camelCase cast, while the spread and every other field use snake_case (response_format, chunking_strategy, known_speaker_*); the prompt rejection and the diarization-options guard inspect only top-level options, not the modelOptions paths; and the diarize-only chunking_strategy restriction does not check modelOptions?.chunking_strategy. These gaps allow bypasses that lead to late 400s instead of early errors. (CLAUDE.md says "Don't create fallback code. It hides problems. Just display errors to the user".)

      )
    }
  }
  protected mapResponseFormat(
    format?: OpenAITranscriptionResponseFormat,
  ): OpenAITranscriptionResponseFormat {
    if (!format) return 'json'
    return format
  }
}
/**
 * Creates an OpenAI transcription adapter with explicit API key.
 * Type resolution happens here at the call site.
 *
 * @param model - The model name (e.g., 'whisper-1')
 * @param apiKey - Your OpenAI API key
 * @param config - Optional additional configuration
 * @returns Configured OpenAI transcription adapter instance with resolved types
 *
 * @example
 * ```typescript
 * const adapter = createOpenaiTranscription('whisper-1', "sk-...");
 *
 * const result = await generateTranscription({
 *   adapter,
 *   audio: audioFile,
 *   language: 'en'
 * });
 * ```
 */
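
The third issue above asks for guards that inspect both top-level options and the modelOptions paths. A minimal sketch of such a check, assuming a simplified request shape: `DiarizeRequest`, `findUnsupportedDiarizeOptions`, and `assertDiarizeRequest` are hypothetical names, not the adapter's real API. The list of rejected options (prompt, include, timestamp_granularities) comes from the walkthrough in this thread.

```typescript
// Hypothetical request shape for illustration; not the adapter's real types.
interface DiarizeRequest {
  prompt?: string
  modelOptions?: {
    prompt?: string
    include?: string[]
    timestamp_granularities?: string[]
    response_format?: string
  }
}

// Options the thread says must not be sent to a diarization model.
const UNSUPPORTED_KEYS = ['prompt', 'include', 'timestamp_granularities'] as const

// Collect every unsupported option, whether set top-level or through modelOptions.
function findUnsupportedDiarizeOptions(request: DiarizeRequest): string[] {
  const found: string[] = []
  if (request.prompt !== undefined) found.push('prompt')
  for (const key of UNSUPPORTED_KEYS) {
    if (request.modelOptions?.[key] !== undefined) found.push(`modelOptions.${key}`)
  }
  return found
}

// Throw before any network call so the caller gets an early, specific error
// instead of a late 400 from the provider.
function assertDiarizeRequest(request: DiarizeRequest): void {
  const found = findUnsupportedDiarizeOptions(request)
  if (found.length > 0) {
    throw new Error(`Diarization models do not support: ${found.join(', ')}`)
  }
}
```

Reporting all offending paths at once, rather than stopping at the first, is a design choice of this sketch and keeps the error message actionable.
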

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@tombeckenham tombeckenham self-assigned this Jun 5, 2026
- Removed `transcribe-provider-options.ts` file and integrated its options into `transcription-provider-options.ts`.
- Updated documentation to reflect changes in response formats, emphasizing the use of `modelOptions.response_format` for diarization.
- Enhanced the transcription adapter to handle new model options and response formats, including support for speaker diarization.
- Adjusted various components and tests to accommodate the new structure and ensure compatibility with the updated transcription features.
@8times4

8times4 commented Jun 12, 2026

Author

Thanks for the review @tombeckenham, should be fixed now.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 4

🧹 Nitpick comments (1)
packages/ai/skills/ai-core/media-generation/SKILL.md (1)

284-286: ⚡ Quick win

Clarify the diarization contract.

chunking_strategy: 'auto' reads like an optional default here, but the adapter enforces that setting for gpt-4o-transcribe-diarize. Please phrase it as required behavior, not a caller-tunable default.

♻️ Suggested wording
-For speaker diarization, use openaiTranscription('gpt-4o-transcribe-diarize').
-It defaults to modelOptions.response_format: 'diarized_json' and chunking_strategy: 'auto';
+For speaker diarization, use openaiTranscription('gpt-4o-transcribe-diarize').
+The adapter enforces modelOptions.response_format: 'diarized_json' and chunking_strategy: 'auto';
 do not pass prompt, include, or timestamp_granularities with this model.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/ai/skills/ai-core/media-generation/SKILL.md` around lines 284 - 286,
The documentation wording implies chunking_strategy: 'auto' is an optional
default, but the adapter enforces that value for
openaiTranscription('gpt-4o-transcribe-diarize'); update the sentence in
SKILL.md to state that chunking_strategy must be 'auto' (required behavior)
rather than a caller-tunable default and keep the note that
modelOptions.response_format defaults to 'diarized_json' and callers must not
pass prompt, include, or timestamp_granularities when using
openaiTranscription('gpt-4o-transcribe-diarize').
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/ts-react-chat/src/lib/server-fns.ts`:
- Around line 84-86: The Zod enum used for transcription response formats
(TRANSCRIPTION_RESPONSE_FORMAT_SCHEMA) is missing 'diarized_json', causing
callers to be rejected before reaching generateTranscription; update the enum
used by both entrypoints to include 'diarized_json' (i.e., add the
'diarized_json' string value to TRANSCRIPTION_RESPONSE_FORMAT_SCHEMA and the
corresponding response-format validator used by the other entrypoint) so
diarized transcription requests validate successfully.

In `@packages/ai/src/types.ts`:
- Around line 1712-1717: The TranscriptionResponseFormat union is missing the
new 'diarized_json' member, causing type errors when callers set
TranscriptionOptions.responseFormat to that value; update the
TranscriptionResponseFormat type to include 'diarized_json' so the shared
contract matches the OpenAI provider, and ensure any other identical union (the
duplicate around the TranscriptionOptions declaration) is updated as well so
both TranscriptionResponseFormat and any repeated type declarations accept
'diarized_json'.

In `@testing/e2e/src/components/TranscriptionUI.tsx`:
- Around line 35-49: The default diarization E2E payload currently hardcodes
chunking_strategy ('chunking_strategy: "auto"') inside transcriptionInput ->
modelOptions when isDiarization is true; remove the chunking_strategy field from
transcriptionInput so the test covers the omitted-field/defaulting branch (leave
known_speaker_names and known_speaker_references as-is), and if you still want
explicit-option coverage add a separate test that constructs a
transcriptionInput with modelOptions.chunking_strategy = 'auto' to exercise the
passthrough path; update references to isDiarization, transcriptionInput, and
modelOptions accordingly.

In `@testing/e2e/src/lib/media-providers.ts`:
- Around line 99-108: The factory currently selects openaiTranscriptionModel
based on the optional feature param (in the openaiTranscriptionModel variable)
which can mismatch the actual transcription options; change
createOpenaiTranscription usage in the factories to derive the model from the
provided transcription options (e.g., inspect responseFormat and modelOptions
for diarization flags such as responseFormat === 'diarized_json' or
modelOptions.diarize) instead of relying on feature, or validate and reject when
diarization-specific options are present while feature !==
'transcription-diarization'; update the logic around openaiTranscriptionModel
and createOpenaiTranscription so diarization requests choose
'gpt-4o-transcribe-diarize' or fail fast.

---

Nitpick comments:
In `@packages/ai/skills/ai-core/media-generation/SKILL.md`:
- Around line 284-286: The documentation wording implies chunking_strategy:
'auto' is an optional default, but the adapter enforces that value for
openaiTranscription('gpt-4o-transcribe-diarize'); update the sentence in
SKILL.md to state that chunking_strategy must be 'auto' (required behavior)
rather than a caller-tunable default and keep the note that
modelOptions.response_format defaults to 'diarized_json' and callers must not
pass prompt, include, or timestamp_granularities when using
openaiTranscription('gpt-4o-transcribe-diarize').
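
The media-providers.ts comment above asks the factory to fail fast when diarization-specific options do not match the selected feature. One way this could look is sketched below. `requestsDiarization` and `selectOpenaiTranscriptionModel` are hypothetical helpers, the `'transcription'` feature ID is assumed, and `'whisper-1'` as the non-diarization model is an assumption for illustration.

```typescript
// 'transcription-diarization' appears in this thread; 'transcription' is assumed.
type TranscriptionFeature = 'transcription' | 'transcription-diarization'

// Hypothetical subset of the options the E2E routes forward.
interface TranscriptionRequestOptions {
  responseFormat?: string
  modelOptions?: {
    response_format?: string
    known_speaker_names?: string[]
    known_speaker_references?: string[]
  }
}

// True when any option only makes sense for a diarization model.
function requestsDiarization(options: TranscriptionRequestOptions): boolean {
  const modelOptions = options.modelOptions
  return (
    options.responseFormat === 'diarized_json' ||
    modelOptions?.response_format === 'diarized_json' ||
    modelOptions?.known_speaker_names !== undefined ||
    modelOptions?.known_speaker_references !== undefined
  )
}

// Pick the model for a feature, rejecting mismatched option/feature combinations.
function selectOpenaiTranscriptionModel(
  feature: TranscriptionFeature,
  options: TranscriptionRequestOptions,
): string {
  if (requestsDiarization(options) && feature !== 'transcription-diarization') {
    throw new Error(
      `Diarization options were passed for feature "${feature}"; use "transcription-diarization".`,
    )
  }
  // Assumption: 'whisper-1' stands in for whatever the non-diarization default is.
  return feature === 'transcription-diarization' ? 'gpt-4o-transcribe-diarize' : 'whisper-1'
}
```
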
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3121ef25-add2-4697-a74d-2dbfb49daa47

📥 Commits

Reviewing files that changed from the base of the PR and between fbb57a0 and c7cf3fc.

📒 Files selected for processing (30)
  • docs/adapters/openai.md
  • docs/comparison/vercel-ai-sdk.md
  • docs/media/generation-hooks.md
  • docs/media/transcription.md
  • examples/ts-react-chat/src/lib/audio-providers.ts
  • examples/ts-react-chat/src/lib/server-audio-adapters.ts
  • examples/ts-react-chat/src/lib/server-fns.ts
  • examples/ts-react-chat/src/routes/api.transcribe.ts
  • examples/ts-react-chat/src/routes/generations.transcription.tsx
  • knip.json
  • packages/ai-client/src/generation-types.ts
  • packages/ai-openai/src/adapters/transcription.ts
  • packages/ai-openai/src/audio/transcribe-provider-options.ts
  • packages/ai-openai/src/audio/transcription-provider-options.ts
  • packages/ai-openai/tests/transcription-adapter.test.ts
  • packages/ai/skills/ai-core/media-generation/SKILL.md
  • packages/ai/src/activities/generateTranscription/index.ts
  • packages/ai/src/types.ts
  • testing/e2e/fixtures/transcription/basic.json
  • testing/e2e/fixtures/transcription/diarization.json
  • testing/e2e/src/components/TranscriptionUI.tsx
  • testing/e2e/src/lib/feature-support.ts
  • testing/e2e/src/lib/features.ts
  • testing/e2e/src/lib/media-providers.ts
  • testing/e2e/src/lib/server-functions.ts
  • testing/e2e/src/lib/types.ts
  • testing/e2e/src/routes/$provider/$feature.tsx
  • testing/e2e/src/routes/api.transcription.stream.ts
  • testing/e2e/src/routes/api.transcription.ts
  • testing/e2e/tests/transcription.spec.ts
💤 Files with no reviewable changes (2)
  • knip.json
  • packages/ai-openai/src/audio/transcribe-provider-options.ts
✅ Files skipped from review due to trivial changes (5)
  • testing/e2e/src/lib/types.ts
  • docs/comparison/vercel-ai-sdk.md
  • testing/e2e/fixtures/transcription/diarization.json
  • docs/media/generation-hooks.md
  • docs/media/transcription.md
🚧 Files skipped from review as they are similar to previous changes (5)
  • packages/ai/src/activities/generateTranscription/index.ts
  • packages/ai-client/src/generation-types.ts
  • docs/adapters/openai.md
  • packages/ai-openai/tests/transcription-adapter.test.ts
  • packages/ai-openai/src/adapters/transcription.ts

Comment thread examples/ts-react-chat/src/lib/server-fns.ts
Comment thread packages/ai/src/types.ts
Comment on lines +35 to +49
  const isDiarization = feature === 'transcription-diarization'
  const transcriptionInput: TranscriptionGenerateInput = {
    audio: TEST_AUDIO_BASE64,
    language: 'en',
    ...(isDiarization
      ? {
          modelOptions: {
            response_format: 'diarized_json',
            chunking_strategy: 'auto',
            known_speaker_names: ['agent', 'customer'],
            known_speaker_references: [TEST_AUDIO_BASE64, TEST_AUDIO_BASE64],
          },
        }
      : {}),
  }

Contributor

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Leave chunking_strategy out of the default diarization E2E payload.

Line 43 hardcodes chunking_strategy: 'auto', so the new Playwright flow only proves the explicit-option path. The omitted-field/defaulting branch can regress without any of the three diarization E2E modes failing. I’d make the default test payload minimal and add a separate explicit-option case only if you still want passthrough coverage.

Suggested change
   const transcriptionInput: TranscriptionGenerateInput = {
     audio: TEST_AUDIO_BASE64,
     language: 'en',
     ...(isDiarization
       ? {
           modelOptions: {
             response_format: 'diarized_json',
-            chunking_strategy: 'auto',
             known_speaker_names: ['agent', 'customer'],
             known_speaker_references: [TEST_AUDIO_BASE64, TEST_AUDIO_BASE64],
           },
         }
       : {}),
   }

As per coding guidelines, testing/e2e/**/*.spec.ts: every feature, bug fix, or behavior change must include E2E coverage using Playwright + aimock.
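
One way to get both branches covered is to build the payload from a small helper, so the default diarization case omits chunking_strategy and a separate case opts in explicitly. This is a minimal sketch, not the PR's code: `buildTranscriptionInput`, the `explicitChunking` flag, and the local types are hypothetical stand-ins for the real E2E types and fixtures.

```typescript
// Hypothetical stand-ins for the E2E types; the real ones live in the repo.
type DiarizationModelOptions = {
  response_format: 'diarized_json'
  chunking_strategy?: 'auto'
  known_speaker_names?: Array<string>
  known_speaker_references?: Array<string>
}

type TranscriptionInput = {
  audio: string
  language: string
  modelOptions?: DiarizationModelOptions
}

// Placeholder value, not real audio.
const TEST_AUDIO_BASE64 = 'AAAA'

// Builds the request payload. chunking_strategy is only included when the
// caller asks for it, so the default payload exercises the omitted-field path.
function buildTranscriptionInput(
  feature: string,
  opts: { explicitChunking?: boolean } = {},
): TranscriptionInput {
  const base: TranscriptionInput = { audio: TEST_AUDIO_BASE64, language: 'en' }
  if (feature !== 'transcription-diarization') return base

  return {
    ...base,
    modelOptions: {
      response_format: 'diarized_json',
      ...(opts.explicitChunking ? { chunking_strategy: 'auto' as const } : {}),
      known_speaker_names: ['agent', 'customer'],
      known_speaker_references: [TEST_AUDIO_BASE64, TEST_AUDIO_BASE64],
    },
  }
}
```

The default test would call `buildTranscriptionInput('transcription-diarization')`, and a passthrough test would pass `{ explicitChunking: true }`.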

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const isDiarization = feature === 'transcription-diarization'
const transcriptionInput: TranscriptionGenerateInput = {
  audio: TEST_AUDIO_BASE64,
  language: 'en',
  ...(isDiarization
    ? {
        modelOptions: {
          response_format: 'diarized_json',
          chunking_strategy: 'auto',
          known_speaker_names: ['agent', 'customer'],
          known_speaker_references: [TEST_AUDIO_BASE64, TEST_AUDIO_BASE64],
        },
      }
    : {}),
}

const isDiarization = feature === 'transcription-diarization'
const transcriptionInput: TranscriptionGenerateInput = {
  audio: TEST_AUDIO_BASE64,
  language: 'en',
  ...(isDiarization
    ? {
        modelOptions: {
          response_format: 'diarized_json',
          known_speaker_names: ['agent', 'customer'],
          known_speaker_references: [TEST_AUDIO_BASE64, TEST_AUDIO_BASE64],
        },
      }
    : {}),
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@testing/e2e/src/components/TranscriptionUI.tsx` around lines 35 - 49, The
default diarization E2E payload currently hardcodes chunking_strategy
('chunking_strategy: "auto"') inside transcriptionInput -> modelOptions when
isDiarization is true; remove the chunking_strategy field from
transcriptionInput so the test covers the omitted-field/defaulting branch (leave
known_speaker_names and known_speaker_references as-is), and if you still want
explicit-option coverage add a separate test that constructs a
transcriptionInput with modelOptions.chunking_strategy = 'auto' to exercise the
passthrough path; update references to isDiarization, transcriptionInput, and
modelOptions accordingly.

Source: Coding guidelines

Comment thread testing/e2e/src/lib/media-providers.ts Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@testing/e2e/src/lib/media-providers.ts`:
- Around line 42-50: The code currently lets an internal flag
modelOptions.diarize flow into the OpenAI SDK; update the transcription request
construction to strip the diarize property before spreading modelOptions into
the SDK call—e.g., in the OpenAI transcription adapter where the request is
built, clone modelOptions and delete or omit the diarize key (while still using
getOpenaiTranscriptionModel(...) for detection), then spread the sanitized
object (e.g., sanitizedModelOptions) into request: { model, file,
...sanitizedModelOptions } so diarize is never sent upstream.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 61c5ddd3-e650-49bd-b749-865a32276717

📥 Commits

Reviewing files that changed from the base of the PR and between c7cf3fc and 58aa20c.

📒 Files selected for processing (8)
  • examples/ts-react-chat/src/lib/server-fns.ts
  • examples/ts-react-chat/src/routes/api.transcribe.ts
  • packages/ai-openai/src/adapters/transcription.ts
  • packages/ai/src/types.ts
  • testing/e2e/src/lib/media-providers.ts
  • testing/e2e/src/lib/server-functions.ts
  • testing/e2e/src/routes/api.transcription.stream.ts
  • testing/e2e/src/routes/api.transcription.ts
🚧 Files skipped from review as they are similar to previous changes (4)
  • examples/ts-react-chat/src/routes/api.transcribe.ts
  • testing/e2e/src/lib/server-functions.ts
  • packages/ai-openai/src/adapters/transcription.ts
  • examples/ts-react-chat/src/lib/server-fns.ts

Comment on lines +42 to +50
function getOpenaiTranscriptionModel(options: TranscriptionAdapterOptions) {
  const modelOptions = options.modelOptions
  const isDiarizationRequest =
    options.responseFormat === 'diarized_json' ||
    modelOptions?.response_format === 'diarized_json' ||
    modelOptions?.diarize === true ||
    modelOptions?.chunking_strategy !== undefined ||
    modelOptions?.known_speaker_names !== undefined ||
    modelOptions?.known_speaker_references !== undefined


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "1) Find where diarize is populated in E2E request construction"
rg -n --type=ts -C2 '\bdiarize\b|response_format|chunking_strategy|known_speaker_' testing/e2e/src

echo
echo "2) Confirm modelOptions is forwarded from API routes"
rg -n --type=ts -C3 'generateTranscription\(|modelOptions,' \
  testing/e2e/src/routes/api.transcription.ts \
  testing/e2e/src/routes/api.transcription.stream.ts

echo
echo "3) Confirm OpenAI adapter request spread behavior"
rg -n --type=ts -C5 '\.\.\.modelOptions|TranscriptionCreateParamsNonStreaming|request\.response_format' \
  packages/ai-openai/src/adapters/transcription.ts

Repository: TanStack/ai

Length of output: 5925


Prevent internal modelOptions.diarize from reaching the OpenAI SDK

  • Current E2E payloads don’t set modelOptions.diarize (they use response_format: 'diarized_json', chunking_strategy, and known_speaker_*), and modelOptions is forwarded unchanged by both transcription routes.
  • The OpenAI adapter still spreads ...modelOptions into the SDK request (request: { model, file, ...modelOptions }), so if any caller ever adds modelOptions.diarize, it would be sent upstream as an unsupported parameter—omit diarize before building the request.
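
A minimal sketch of the stripping step, assuming the request is assembled from a plain options object. `sanitizeModelOptions` and `buildRequest` are hypothetical names used for illustration only; the adapter's actual request builder and types may look different.

```typescript
// Loose stand-in for the adapter's modelOptions type (assumption).
type ModelOptions = Record<string, unknown>

// Returns a shallow copy of modelOptions without the internal `diarize` flag.
// The caller's object is left untouched.
function sanitizeModelOptions(modelOptions?: ModelOptions): ModelOptions {
  const sanitized: ModelOptions = { ...(modelOptions ?? {}) }
  delete sanitized.diarize
  return sanitized
}

// Hypothetical request builder: only sanitized options are spread into the
// object that would be handed to the SDK.
function buildRequest(model: string, file: string, modelOptions?: ModelOptions) {
  return { model, file, ...sanitizeModelOptions(modelOptions) }
}
```

Model detection can still read the original `modelOptions` before sanitizing, so the flag would keep working as a local switch without ever reaching the upstream API.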
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@testing/e2e/src/lib/media-providers.ts` around lines 42 - 50, The code
currently lets an internal flag modelOptions.diarize flow into the OpenAI SDK;
update the transcription request construction to strip the diarize property
before spreading modelOptions into the SDK call—e.g., in the OpenAI
transcription adapter where the request is built, clone modelOptions and delete
or omit the diarize key (while still using getOpenaiTranscriptionModel(...) for
detection), then spread the sanitized object (e.g., sanitizedModelOptions) into
request: { model, file, ...sanitizedModelOptions } so diarize is never sent
upstream.

@tombeckenham tombeckenham left a comment

Contributor

Thanks for the thorough follow-up here — the E2E coverage across all three transports, the openai-diarize option on the example page with speaker-labeled segments, the shared TranscriptionResponseFormat extraction, and the much more complete validateDiarizationOptions all address my earlier review. The response-mode discriminant ('diarized' | 'verbose' | 'plain') + request-plan refactor is a genuine improvement. 🙏

I'm requesting changes on one architectural point, plus the small cleanups it implies.

Requested change — keep diarized_json out of the shared union

diarized_json was added to the shared, cross-provider TranscriptionResponseFormat (packages/ai/src/types.ts:1712-1718). The problem: only the OpenAI adapter reads the top-level responseFormat. The other transcription adapters drive diarization from a boolean instead and ignore responseFormat entirely:

  • packages/ai-elevenlabs/src/adapters/transcription.ts → modelOptions.diarize: boolean
  • packages/ai-grok/src/adapters/transcription.ts → modelOptions.diarize: boolean
  • packages/ai-fal/... → no diarization support

So after this change, responseFormat: 'diarized_json' type-checks for every provider but is silently ignored by three of them. diarized_json isn't a portable format — it's OpenAI's wire-format literal for one model (gpt-4o-transcribe-diarize). The shared union should only advertise formats a generic caller can actually request.

Please keep diarized_json in OpenAITranscriptionResponseFormat only and revert the shared TranscriptionResponseFormat to the portable set ('json' | 'text' | 'srt' | 'verbose_json' | 'vtt'). Diarization on OpenAI is already driven through modelOptions.response_format: 'diarized_json' (that's what the example and E2E use), so nothing in the feature path is lost.

Two cleanups fall out of this and should land together:

  1. OpenAITranscriptionResponseFormat = TranscriptionResponseFormat | 'diarized_json' (packages/ai-openai/src/audio/transcription-provider-options.ts:4-6) is currently redundant: diarized_json is already in the shared union, so it collapses to exactly TranscriptionResponseFormat. Once diarized_json moves out, this alias becomes a real extension again. ✅
  2. const topLevelResponseFormat = responseFormat as OpenAITranscriptionResponseFormat | undefined (packages/ai-openai/src/adapters/transcription.ts:253-255) is a no-op cast today and becomes a safe widening after the change — drop the as and assign directly.
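
For illustration, the type relationship after the change could look roughly like the sketch below. The two type names and the portable literal set come from this review; `toOpenAIResponseFormat` is a hypothetical helper that only exists here to show the cast-free assignment.

```typescript
// Shared, cross-provider union: only the portable formats.
type TranscriptionResponseFormat = 'json' | 'text' | 'srt' | 'verbose_json' | 'vtt'

// OpenAI-only extension: becomes a genuine superset once diarized_json
// is removed from the shared union.
type OpenAITranscriptionResponseFormat = TranscriptionResponseFormat | 'diarized_json'

// Hypothetical helper: the narrower shared type is assignable to the wider
// OpenAI type, so no `as` cast is needed.
function toOpenAIResponseFormat(
  responseFormat: TranscriptionResponseFormat | undefined,
): OpenAITranscriptionResponseFormat | undefined {
  const topLevelResponseFormat: OpenAITranscriptionResponseFormat | undefined = responseFormat
  return topLevelResponseFormat
}
```

With this shape, a generic caller passing `'diarized_json'` as the top-level `responseFormat` would fail type-checking, while OpenAI-specific `modelOptions.response_format` could still accept it.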

Direction we want — a portable diarize on/off

For where this should go cross-provider: the portable concept is diarization on/off, not a format string. And the output is already normalized — TranscriptionSegment.speaker?: string exists in the shared type and ElevenLabs already populates it. So the clean cross-provider surface is a top-level diarize?: boolean on TranscriptionOptions that each adapter maps to its own mechanism (OpenAI → diarize model + diarized_json; ElevenLabs/Grok → diarize: true), with results unified via segment.speaker.
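
A rough sketch of that mapping, under the assumption that each adapter translates the portable flag into its own request fields. `PortableTranscriptionOptions` and `mapDiarize` are hypothetical names, and the returned objects are simplified placeholders rather than the adapters' real request shapes.

```typescript
type Provider = 'openai' | 'elevenlabs' | 'grok'

// Hypothetical portable surface: a single on/off switch.
interface PortableTranscriptionOptions {
  diarize?: boolean
}

// Translates the portable flag into provider-specific request fields.
// Returns an empty object when diarization is not requested.
function mapDiarize(
  provider: Provider,
  options: PortableTranscriptionOptions,
): Record<string, unknown> {
  if (!options.diarize) return {}

  switch (provider) {
    case 'openai':
      // OpenAI: diarize model plus its wire-format literal.
      return { model: 'gpt-4o-transcribe-diarize', response_format: 'diarized_json' }
    case 'elevenlabs':
    case 'grok':
      // ElevenLabs and Grok: a boolean on the request.
      return { diarize: true }
  }
}
```

Callers would then read speakers from `segment.speaker` regardless of provider, without needing to know which mechanism was used underneath.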

I don't want to balloon this PR's scope. Either is fine with me:

  • add a top-level diarize?: boolean wired for OpenAI only in this PR (ElevenLabs/Grok wiring as a follow-up), or
  • keep this PR OpenAI-scoped via modelOptions and I'll open a follow-up issue for the portable flag.

Let me know which you'd prefer. (Note for whoever does the portable work: Grok currently types its speaker as number while the shared segment.speaker is string — that needs reconciling.)

Minor

  • Validation gap (packages/ai-openai/src/adapters/transcription.ts:458-467): the matching-length check only fires when both known_speaker_names and known_speaker_references are present. A lone array (one provided without the other) passes local validation and defers to a late opaque 400 — the exact thing this validator exists to prevent. Please reject early when exactly one of the two is provided.
  • Dead branch (testing/e2e/src/lib/media-providers.ts:47): modelOptions?.diarize === true is never set by any caller (the real switch is response_format: 'diarized_json'), and diarize isn't a valid OpenAI param. Please drop that clause so model selection is driven only by real fields.
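
For the validation gap, the early rejection could be sketched as below. `validateKnownSpeakers` and the error messages are hypothetical; the existing validateDiarizationOptions may structure its checks differently.

```typescript
// Local stand-in for the relevant slice of the OpenAI model options.
interface SpeakerOptions {
  known_speaker_names?: Array<string>
  known_speaker_references?: Array<string>
}

// Rejects a lone array first, then checks that the lengths match.
function validateKnownSpeakers(options: SpeakerOptions): void {
  const { known_speaker_names: names, known_speaker_references: refs } = options

  // Exactly one of the two provided: fail locally instead of deferring to a 400.
  if ((names === undefined) !== (refs === undefined)) {
    throw new Error(
      'known_speaker_names and known_speaker_references must be provided together.',
    )
  }

  // Both provided: each name needs exactly one reference.
  if (names !== undefined && refs !== undefined && names.length !== refs.length) {
    throw new Error(
      'known_speaker_names and known_speaker_references must have the same length.',
    )
  }
}
```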

Blocking: the shared-union scoping + the two cleanups it implies. The rest is quick polish. Thanks again!

🤖 Generated with Claude Code
